A Direct Estimation Approach to Sparse Linear Discriminant Analysis
This paper considers sparse linear discriminant analysis of high-dimensional
data. In contrast to the existing methods which are based on separate
estimation of the precision matrix Ω and the difference δ of the mean
vectors, we introduce a simple and effective classifier by estimating the
product Ωδ directly through constrained ℓ1 minimization. The
estimator can be implemented efficiently using linear programming and the
resulting classifier is called the linear programming discriminant (LPD) rule.
The LPD rule is shown to have desirable theoretical and numerical properties.
It exploits the approximate sparsity of Ωδ and, as a consequence, can still
perform well even when Ω and/or δ cannot be
estimated consistently. Asymptotic properties of the LPD rule are investigated
and consistency and rate of convergence results are given. The LPD classifier
has superior finite sample performance and significant computational advantages
over the existing methods that require separate estimation of Ω and δ.
The LPD rule is also applied to analyze real datasets from lung cancer and
leukemia studies. The classifier performs favorably in comparison to existing
methods.
Comment: 39 pages. To appear in Journal of the American Statistical Association.
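As a rough illustration of the direct-estimation idea, the sketch below estimates the product of the precision matrix and the mean difference in one step by solving a constrained ℓ1 problem as a linear program. The abstract only states that the product is obtained by constrained ℓ1 minimization via linear programming. The sup-norm constraint used here, the tuning parameter `lam`, the midpoint classification rule, and the function names `lpd_direction` and `lpd_classify` are assumptions made for this example.

```python
import numpy as np
from scipy.optimize import linprog


def lpd_direction(S, d, lam):
    """Sketch: minimize ||beta||_1 subject to ||S @ beta - d||_inf <= lam.

    S is a sample covariance matrix, d a difference of sample means, and
    lam a tuning parameter (all three names are assumptions of this example).
    """
    p = len(d)
    # Write beta = u - v with u, v >= 0, so that ||beta||_1 = sum(u) + sum(v).
    c = np.ones(2 * p)
    # Two blocks of inequalities:
    #   S(u - v) - d <= lam   and   -(S(u - v) - d) <= lam
    A = np.block([[S, -S], [-S, S]])
    b = np.concatenate([lam + d, lam - d])
    res = linprog(c, A_ub=A, b_ub=b, bounds=(0, None))
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p] - res.x[p:]


def lpd_classify(X, mu1, mu2, beta):
    """Assign class 1 when (x - midpoint) @ beta >= 0, otherwise class 2."""
    scores = (X - (mu1 + mu2) / 2.0) @ beta
    return np.where(scores >= 0, 1, 2)
```

With an identity covariance, the program simply shrinks each coordinate of `d` toward zero by `lam`, which gives a quick sanity check of the formulation. For a singular `S` and a small `lam`, the constraint set can be empty, in which case the sketch raises an error.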
Adaptive Thresholding for Sparse Covariance Matrix Estimation
In this paper we consider estimation of sparse covariance matrices and
propose a thresholding procedure which is adaptive to the variability of
individual entries. The estimators are fully data driven and enjoy excellent
performance both theoretically and numerically. It is shown that the estimators
adaptively achieve the optimal rate of convergence over a large class of sparse
covariance matrices under the spectral norm. In contrast, the commonly used
universal thresholding estimators are shown to be sub-optimal over the same
parameter spaces. Support recovery is also discussed. The adaptive thresholding
estimators are easy to implement. Numerical performance of the estimators is
studied using both simulated and real data. Simulation results show that the
adaptive thresholding estimators uniformly outperform the universal
thresholding estimators. The method is also illustrated in an analysis on a
dataset from a small round blue-cell tumors microarray experiment. A supplement
to this paper which contains additional technical proofs is available online.
Comment: To appear in Journal of the American Statistical Association.
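To make "adaptive to the variability of individual entries" concrete, here is a minimal sketch of entry-wise thresholding of a sample covariance matrix. The abstract does not give the threshold, so several choices here are assumptions for illustration: the form `delta * sqrt(theta_ij * log(p) / n)`, the default `delta=2.0`, the use of hard thresholding, and leaving the diagonal untouched. A universal rule would instead use a single threshold for every entry.

```python
import numpy as np


def adaptive_threshold_cov(X, delta=2.0):
    """Sketch of entry-adaptive hard thresholding of the sample covariance.

    X is an n x p data matrix. The threshold form and delta are assumptions
    of this example, not details taken from the abstract.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n  # sample covariance (divisor n)
    # theta[i, j] estimates the variance of the product
    # (X_ki - mean_i) * (X_kj - mean_j), i.e. the variability of S[i, j].
    prod = Xc[:, :, None] * Xc[:, None, :]  # shape (n, p, p); O(n p^2) memory
    theta = ((prod - S) ** 2).mean(axis=0)
    lam = delta * np.sqrt(theta * np.log(p) / n)  # one threshold per entry
    S_thr = np.where(np.abs(S) >= lam, S, 0.0)
    # Keep the variances as they are (a choice made for this sketch).
    np.fill_diagonal(S_thr, np.diag(S))
    return S_thr
```

Entries with noisy products get a larger threshold, so a moderately sized covariance between two high-variance coordinates can be set to zero while an equally sized one between low-variance coordinates is kept.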
Large-Scale Multiple Testing of Correlations
Multiple testing of correlations arises in many applications including gene coexpression network analysis and brain connectivity analysis. In this article, we consider large-scale simultaneous testing for correlations in both the one-sample and two-sample settings. New multiple testing procedures are proposed and a bootstrap method is introduced for estimating the proportion of the nulls falsely rejected among all the true nulls. We investigate the properties of the proposed procedures both theoretically and numerically. It is shown that the procedures asymptotically control the overall false discovery rate and false discovery proportion at the nominal level. Simulation results show that the methods perform well numerically in terms of both the size and power of the test and that they significantly outperform two alternative methods. The two-sample procedure is also illustrated by an analysis of a prostate cancer dataset for the detection of changes in coexpression patterns between gene expression levels. Supplementary materials for this article are available online.
A Constrained L1 Minimization Approach to Sparse Precision Matrix Estimation
A constrained L1 minimization method is proposed for estimating a sparse
inverse covariance matrix based on a sample of n iid p-variate random
variables. The resulting estimator is shown to enjoy a number of desirable
properties. In particular, it is shown that the rate of convergence between the
estimator and the true s-sparse precision matrix under the spectral norm is
s√(log p/n) when the population distribution has either exponential-type
tails or polynomial-type tails. Convergence rates under the elementwise
L∞ norm and Frobenius norm are also presented. In addition, graphical
model selection is considered. The procedure is easily implementable by linear
programming. Numerical performance of the estimator is investigated using both
simulated and real data. In particular, the procedure is applied to analyze a
breast cancer dataset. The procedure performs favorably in comparison to
existing methods.
Comment: To appear in Journal of the American Statistical Association.
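The remark that the procedure is easily implementable by linear programming can be sketched as follows. Each column of the precision matrix estimate is obtained from its own small linear program, and the columns are then combined into a symmetric matrix. The abstract does not spell these details out, so the sup-norm constraint, the tuning parameter `lam`, the symmetrization step, and the function name `constrained_l1_precision` are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import linprog


def constrained_l1_precision(S, lam):
    """Sketch: for each column j, minimize ||beta||_1 subject to
    ||S @ beta - e_j||_inf <= lam, where e_j is the j-th unit vector.

    The exact constraint and the symmetrization are assumptions here.
    """
    p = S.shape[0]
    c = np.ones(2 * p)  # objective for beta = u - v with u, v >= 0
    A = np.block([[S, -S], [-S, S]])
    B = np.empty((p, p))
    for j in range(p):
        e = np.zeros(p)
        e[j] = 1.0
        b = np.concatenate([lam + e, lam - e])
        res = linprog(c, A_ub=A, b_ub=b, bounds=(0, None))
        if not res.success:
            raise RuntimeError(res.message)
        B[:, j] = res.x[:p] - res.x[p:]
    # Symmetrize: for each pair (i, j) keep the entry of smaller magnitude.
    return np.where(np.abs(B) <= np.abs(B.T), B, B.T)
```

Because the columns are solved independently, the loop parallelizes trivially. For a diagonal input the result is just a slightly shrunk inverse of the diagonal.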
Two-Sample Covariance Matrix Testing and Support Recovery
This paper proposes a new test for the equality of two covariance matrices Σ1 and Σ2 in the high-dimensional setting and investigates its theoretical and numerical properties. The limiting null distribution of the test statistic is derived. The test is shown to enjoy certain optimality and to be especially powerful against sparse alternatives. The simulation results show that the test significantly outperforms the existing methods both in terms of size and power. Analysis of prostate cancer datasets is carried out to demonstrate the application of the testing procedures. When the null hypothesis of equal covariance matrices is rejected, it is often of significant interest to further investigate in which way they differ. Motivated by applications in genomics, we also consider two related problems, recovering the support of Σ1 − Σ2 and testing the equality of the two covariance matrices row by row. New testing procedures are introduced and their properties are studied. Applications to gene selection are also discussed.
Fast and Adaptive Sparse Precision Matrix Estimation in High Dimensions
This paper proposes a new method for estimating sparse precision matrices in
the high dimensional setting. Fast computation and adaptive procedures for this
problem have received much attention. We propose a novel approach, called
Sparse Column-wise Inverse Operator, to address these two issues. We analyze an
adaptive procedure based on cross validation, and establish its convergence
rate under the Frobenius norm. The convergence rates under other matrix norms
are also established. This method also enjoys the advantage of fast computation
for large-scale problems, via a coordinate descent algorithm. Numerical merits
are illustrated using both simulated and real datasets. In particular, it
performs favorably on an HIV brain tissue dataset and an ADHD resting-state
fMRI dataset.
Comment: Main text: 24 pages. Supplement: 13 pages. R package scio implementing
the proposed method is available on CRAN at
https://cran.r-project.org/package=scio . Published in Journal of Multivariate
Analysis at
http://www.sciencedirect.com/science/article/pii/S0047259X1400260
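A column-wise estimator solved by coordinate descent can be sketched as below. This is not the scio package's code. The per-column objective (a quadratic loss plus an ℓ1 penalty), the penalty parameter `lam`, the stopping rule, and the names `soft_threshold` and `columnwise_cd` are assumptions made for this example; the abstract only states that the method works column by column and uses a coordinate descent algorithm.

```python
import numpy as np


def soft_threshold(z, lam):
    """Soft-thresholding operator: sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * max(abs(z) - lam, 0.0)


def columnwise_cd(S, j, lam, n_sweeps=200, tol=1e-8):
    """Sketch: coordinate descent for column j of a precision matrix,
    minimizing 0.5 * beta' S beta - beta[j] + lam * ||beta||_1.

    Assumes S has strictly positive diagonal entries.
    """
    p = S.shape[0]
    beta = np.zeros(p)
    for _ in range(n_sweeps):
        max_change = 0.0
        for i in range(p):
            # Target e_j[i] minus the contribution of all other coordinates.
            r = (1.0 if i == j else 0.0) - S[i] @ beta + S[i, i] * beta[i]
            new = soft_threshold(r, lam) / S[i, i]
            max_change = max(max_change, abs(new - beta[i]))
            beta[i] = new
        if max_change < tol:  # stop once a full sweep changes nothing
            break
    return beta
```

Each coordinate update has a closed form, so no linear program is needed. With `lam=0` and a positive definite `S`, the iteration reduces to Gauss-Seidel and recovers the corresponding column of the exact inverse.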